Generative models, as an important family of statistical modeling, target learning the observed data distribution via generating new instances. Along with the rise of neural networks, deep generative models, such as variational autoencoders (VAEs) and generative adversarial network (GANs), have made tremendous progress in 2D image synthesis. Recently, researchers switch their attentions from the 2D space to the 3D space considering that 3D data better aligns with our physical world and hence enjoys great potential in practice. However, unlike a 2D image, which owns an efficient representation (i.e., pixel grid) by nature, representing 3D data could face far more challenges. Concretely, we would expect an ideal 3D representation to be capable enough to model shapes and appearances in details, and to be highly efficient so as to model high-resolution data with fast speed and low memory cost. However, existing 3D representations, such as point clouds, meshes, and recent neural fields, usually fail to meet the above requirements simultaneously. In this survey, we make a thorough review of the development of 3D generation, including 3D shape generation and 3D-aware image synthesis, from the perspectives of both algorithms and more importantly representations. We hope that our discussion could help the community track the evolution of this field and further spark some innovative ideas to advance this challenging task.
translated by 谷歌翻译
在呼吸运动下重建肺部锥体束计算机断层扫描(CBCT)是一个长期的挑战。这项工作更进一步,以解决一个具有挑战性的设置,以重建仅来自单个} 3D CBCT采集的多相肺图像。为此,我们介绍了对观点或Regas的概述综合。 Regas提出了一种自我监督的方法,以合成不足的层析成像视图并减轻重建图像中的混叠伪像。该方法可以更好地估计相间变形矢量场(DVF),这些矢量场(DVF)用于增强无合成的直接观察结果的重建质量。为了解决高分辨率4D数据上深神经网络的庞大记忆成本,Regas引入了一种新颖的射线路径变换(RPT),该射线路径转换(RPT)允许分布式,可区分的远期投影。 REGA不需要其他量度尺寸,例如先前的扫描,空气流量或呼吸速度。我们的广泛实验表明,REGA在定量指标和视觉质量方面的表现明显优于可比的方法。
translated by 谷歌翻译
语言模型既展示了定量的改进,又展示了新的定性功能,随着规模的增加。尽管它们具有潜在的变革性影响,但这些新能力的特征却很差。为了为未来的研究提供信息,为破坏性的新模型能力做准备,并改善社会有害的效果,至关重要的是,我们必须了解目前和近乎未来的能力和语言模型的局限性。为了应对这一挑战,我们介绍了超越模仿游戏基准(Big Bench)。 Big Bench目前由204个任务组成,由132家机构的442位作者贡献。任务主题是多样的,从语言学,儿童发展,数学,常识性推理,生物学,物理学,社会偏见,软件开发等等。 Big-Bench专注于被认为超出当前语言模型的功能的任务。我们评估了OpenAI的GPT型号,Google内部密集变压器体系结构和大型基础上的开关稀疏变压器的行为,跨越了数百万到数十亿个参数。此外,一个人类专家评估者团队执行了所有任务,以提供强大的基准。研究结果包括:模型性能和校准都随规模改善,但绝对的术语(以及与评估者的性能相比);在模型类中的性能非常相似,尽管带有稀疏性。逐渐和预测的任务通常涉及大量知识或记忆成分,而在临界规模上表现出“突破性”行为的任务通常涉及多个步骤或组成部分或脆性指标;社交偏见通常会随着含糊不清的环境而随着规模而增加,但这可以通过提示来改善。
translated by 谷歌翻译
人员搜索旨在共同本地化和识别来自自然的查询人员,不可用的图像,这在过去几年中在计算机视觉社区中积极研究了这一图像。在本文中,我们将在全球和本地围绕目标人群的丰富的上下文信息中阐述,我们分别指的是场景和组上下文。与以前的作品单独处理这两种类型的作品,我们将它们利用统一的全球本地上下文网络(GLCNet),其具有直观的功能增强。具体地,以多级方式同时增强重新ID嵌入和上下文特征,最终导致人员搜索增强,辨别特征。我们对两个人搜索基准(即Cuhk-Sysu和PRW)进行实验,并将我们的方法扩展到更具有挑战性的环境(即,在MovieIenet上的字符搜索)。广泛的实验结果表明,在三个数据集上的最先进方法中提出的GLCNET的一致性改进。我们的源代码,预先训练的型号,以及字符搜索的新设置可以:https://github.com/zhengpeng7/llcnet。
translated by 谷歌翻译
近年来,由于其表达力和灵活性,神经隐式表示在3D重建中获得了普及。然而,神经隐式表示的隐式性质导致缓慢的推理时间并且需要仔细初始化。在本文中,我们重新审视经典且无处不在的点云表示,并使用泊松表面重建(PSR)的可分辨率配方引入可分化的点对网格层,其允许给予定向的GPU加速的指示灯的快速解决方案点云。可微分的PSR层允许我们通过隐式指示器字段有效地和分散地桥接与3D网格的显式3D点表示,从而实现诸如倒角距离的表面重建度量的端到端优化。因此,点和网格之间的这种二元性允许我们以面向点云表示形状,这是显式,轻量级和富有表现力的。与神经内隐式表示相比,我们的形状 - 点(SAP)模型更具可解释,轻量级,并通过一个级别加速推理时间。与其他显式表示相比,如点,补丁和网格,SA​​P产生拓扑无关的水密歧管表面。我们展示了SAP对无知点云和基于学习的重建的表面重建任务的有效性。
translated by 谷歌翻译
NeRF synthesizes novel views of a scene with unprecedented quality by fitting a neural radiance field to RGB images. However, NeRF requires querying a deep Multi-Layer Perceptron (MLP) millions of times, leading to slow rendering times, even on modern GPUs. In this paper, we demonstrate that real-time rendering is possible by utilizing thousands of tiny MLPs instead of one single large MLP. In our setting, each individual MLP only needs to represent parts of the scene, thus smaller and faster-to-evaluate MLPs can be used. By combining this divide-and-conquer strategy with further optimizations, rendering is accelerated by three orders of magnitude compared to the original NeRF model without incurring high storage costs. Further, using teacher-student distillation for training, we show that this speed-up can be achieved without sacrificing visual quality.
translated by 谷歌翻译
我们研究马尔可夫决策过程(MDP)框架中的离线数据驱动的顺序决策问题。为了提高学习政策的概括性和适应性,我们建议通过一套关于在政策诱导的固定分配所在的分发的一套平均奖励来评估每项政策。给定由某些行为策略生成的多个轨迹的预收集数据集,我们的目标是在预先指定的策略类中学习一个强大的策略,可以最大化此集的最小值。利用半参数统计的理论,我们开发了一种统计上有效的策略学习方法,用于估算DE NED强大的最佳政策。在数据集中的总决策点方面建立了达到对数因子的速率最佳遗憾。
translated by 谷歌翻译
我们在无限地平线马尔可夫决策过程中考虑批量(离线)策略学习问题。通过移动健康应用程序的推动,我们专注于学习最大化长期平均奖励的政策。我们为平均奖励提出了一款双重强大估算器,并表明它实现了半导体效率。此外,我们开发了一种优化算法来计算参数化随机策略类中的最佳策略。估计政策的履行是通过政策阶级的最佳平均奖励与估计政策的平均奖励之间的差异来衡量,我们建立了有限样本的遗憾保证。通过模拟研究和促进体育活动的移动健康研究的分析来说明该方法的性能。
translated by 谷歌翻译
Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.
translated by 谷歌翻译
In recent years, arbitrary image style transfer has attracted more and more attention. Given a pair of content and style images, a stylized one is hoped that retains the content from the former while catching style patterns from the latter. However, it is difficult to simultaneously keep well the trade-off between the content details and the style features. To stylize the image with sufficient style patterns, the content details may be damaged and sometimes the objects of images can not be distinguished clearly. For this reason, we present a new transformer-based method named STT for image style transfer and an edge loss which can enhance the content details apparently to avoid generating blurred results for excessive rendering on style features. Qualitative and quantitative experiments demonstrate that STT achieves comparable performance to state-of-the-art image style transfer methods while alleviating the content leak problem.
translated by 谷歌翻译